FINAL PROGRESS REPORT


INTRODUCTION 

Recent data support the hypothesis that DNA amplification plays a role in
establishing the malignant cell phenotype in cancer (Nikolsky et al., 2008). However, the
basic mechanism underlying DNA amplification has not yet been elucidated, though it
may be more common than originally thought (Gomez 2008; Gomez and Antequera
2008). There appears to be a link between the steroid hormone estrogen and many
forms of breast cancer, but the detailed mechanism is unknown. Estrogen can turn on
gene expression and thus activate the production of the proteins encoded by these
genes. Our recent results in a model system indicated that a steroid hormone can
induce gene amplification in which re-replication creates extra copies of the gene. This
in turn will also increase production of the protein encoded by the amplified gene.
Hormonal induction of gene amplification is a new paradigm for how hormones work,
and we wish to see if it applies to breast cancer. We wish to examine whether a correlation
exists between estrogen receptor (ER) binding at novel sites in the breast cancer
genome and juxtaposition with replication origins that escape normal cellular controls
and re-replicate, leading to DNA amplification. The recent observations that estrogen
induces cell proliferation by retention of MCM proteins in the nucleus and by induction of
the loading factor Cdt1 (Pan et al., 2006) support our hypothesis, especially since
increases in MCM proteins and Cdt1 have been shown to induce DNA amplification in
yeast (Gopalakrishnan et al., 2001; Nguyen et al., 2001; Green et al., 2006) and
increased Cdt1 results in re-replication in human cells (Dorn et al., 2008). The
N-terminus of Cdt1 is important for re-replication, perhaps through interactions with PCNA
and/or cyclin (Teer and Dutta, 2008). Cdt1 and its inhibitor geminin are deregulated in
human tumors (Petropoulou et al., 2008). Moreover, stalled replication forks and DNA
re-replication lead to DNA breakage and rearrangements (Green and Li, 2005;
Raveendranathan et al., 2006; Zhu and Dutta, 2006; Dutta, 2007; Hook et al., 2007),
which are hallmarks of cancer. Our research may provide a new paradigm for hormonal
induction of breast cancer via gene amplification, leading to new methods of diagnosis
and treatment.

BODY 

In the research supported by this grant, we proposed to map estrogen receptor
binding sites, origins of replication and regions of DNA amplification in surgically derived
breast cancer tissue (see Appendix 1: DOD meeting abstract). We report our progress
on these three specific aims. We also report on relevant recent publications that
support our working model. The P.I. (Susan Gerbi) and two co-P.I.s (Alex Brodsky and
Ben Raphael) meet together with their lab personnel roughly once a month to review
past results and design future experimental strategies.

The text that follows is the revised final report for this grant, organized according
to the Statement of Work listed in the approved grant application, which was itself
organized according to the three Specific Aims. Figures and tables are located at the
end of the document.

Statement of Work Item 1: Map ER binding in the human genome

Months  1-6:  Work  out  the  methodology  (Gerbi  and  Brodsky  labs) 

Chromatin from Breast Cancer Tissue - this is the starting material for each of the
three specific aims. In our previous progress report we reported the difficulty in
obtaining breast cancer tissue samples, as recent changes in clinical protocols now
result in the vast majority of patients receiving chemotherapy prior to breast cancer
surgery. We cannot use tissue derived from patients who received neoadjuvant therapy, and
our potential supply of material was therefore drastically reduced. As reported
previously, we expanded our network of clinical collaborators to include other surgeons
and pathologists at both R.I. Hospital and Women and Infants Hospital. With this
increased outreach, we have now obtained some surgically derived tissue samples and
are hoping for more samples (see Table 1). To work out the methodology for chromatin
isolation from breast cancer tissue, we used material from the R.I. Hospital breast
cancer tumor bank that did not meet our criteria above, but was available for these pilot
experiments. We determined that tissue that was freshly obtained and frozen from a
current surgery was equivalent to tissue that had been stored frozen for a period of
time. For chromatin immunoprecipitation (ChIP) procedures, the tissue is subjected to
formaldehyde fixation, homogenized and then sheared by sonication. We found that the
breast cancer tissue (portions of 1.0-1.5 cm tumor specimens) was very fibrous and
hard to break open by standard homogenization, resulting in low yields of chromatin.
We purchased a microhomogenizer (the same model used by Dr. Peggy Farnham for
her breast cancer chromatin studies) to use for breast cancer tissue disruption. Our
initial results were promising. Using the microhomogenizer, we were able to prepare
sonicated chromatin from cell lines and tissues that averaged less than 500 bp in size
(Figure 1) and works well for ChIP.

Months 7-20: Carry out ER ChIP-chip experiments (Brodsky, Gerbi and Raphael labs)

ChIP of ER binding sites in the breast cancer genome - these data had already
been obtained by co-P.I. Alex Brodsky for MCF7 cultured breast cancer cells (Carroll et
al., 2005 and 2006), and we planned to use them as a reference source as we
developed the ChIP methodology for breast cancer tissue. In the grant application we
proposed to do ChIP-chip (DNA microarrays of chromatin immunoprecipitated samples).
However, that method has now been superseded by ChIP-Seq (DNA sequencing of
chromatin immunoprecipitated DNA samples). This method has greater sensitivity, in
addition to better resolution, than ChIP-chip. ChIP-Seq revealed 10,000-13,000 ER-alpha
binding sites in the genome of MCF7 breast cancer cells (Lin et al., 2007; Fullwood et
al., 2009; Welboren et al., 2009; Hurtado et al., 2011). Because of the difficulty in
obtaining tumor sample specimens (see preceding section) and the fact that none of the
samples we received were FISH positive for HER2 gene amplification (Table 1), we
decided to carry out our experiments in MCF7 cultured breast cancer cells. An added
bonus of this cell culture model system is that the ER-alpha binding sites have already
been mapped by several groups by the more advanced method of ChIP-Seq (see
above), obviating the need for us to map these sites.

Months 21-24: Confirm ER binding site candidates (Gerbi lab)

This was no longer necessary since other groups have already mapped ER-alpha
binding sites in the MCF7 breast cancer genome (see above). The decreased
experimentation needed for Statement of Work Item 1 allowed us to devote more time to
developing methodology for Statement of Work Item 2 (see below).

Statement  of  Work  Item  2:  Map  binding  sites  for  the  Origin  Recognition  Complex 
(ORC)  across  the  human  genome  (Brodsky,  Gerbi  and  Raphael  labs) 

Months  7-20:  Map  replication  origins  in  the  breast  cancer  genome 

As stated above, because of the paucity of breast cancer tissue material, we
decided to do our initial experiments to map replication origins on the well studied MCF7
cell line, where the ER binding sites are already mapped. This also has the advantage of
sample homogeneity, which would be a concern for tissue from breast cancer tumors.
Our plan was to analyze DNA from unsynchronized cells in order to capture all origins
regardless of when during S phase they are activated. We reported in previous progress
reports that we obtained polyclonal antibodies for human ORC2 and Cdt1 and a
monoclonal antibody against human ORC6 from Aloys Schepers, an antibody against
human ORC1 from Mel DePamphilis and a mammalian expression clone for
FLAG-tagged human ORC1 from Dr. Kohji Noguchi (a former post-doc with Dr. Mel
DePamphilis). The Brodsky lab checked these antibodies by Western blots, but there
was high background with multiple bands. Moreover, two tries of ChIP with ORC
antibody were unsuccessful. Discussions that P.I. Susan Gerbi had with Drs. Aloys
Schepers and Michael Leffak at the Cold Spring Harbor DNA Replication Meeting
revealed that ChIP on mammalian cells with ORC2 antibodies has a high background.
Instead, the Brodsky lab considered cloning ORC1 into a FLAG-tag vector for
transfection into MCF7 breast cancer cells, reasoning that ChIP with a FLAG antibody
should give better results. However, we decided instead to pursue the more promising
approach described below.

The goal is to map DNA replication origins in the breast cancer genome. A
problem with the approach of mapping ORC binding sites is that ORC also binds to
silent origins that are not used, so we would not know which origins are active in the
breast cancer cells. Moreover, as described above, there are problems with the
antibodies against ORC. We decided that superior results would be obtained by
isolating small nascent DNA and mapping replication origins
directly by sequencing the nascent strands, rather than using ChIP-chip or ChIP-Seq to
map ORC binding sites. The short nascent strand sequencing approach allows us to
identify by this direct method all origins that are active in the breast cancer genome.
Nascent strands have been used to map replication origins for a limited portion (1%,
ENCODE project) of the human genome (Lucas et al., 2007; Cadoret et al., 2008) and
have given more reliable results than BrdU labeling of non-lambda exonuclease treated
DNA (Birney et al., 2007; Karnani et al., 2007); results from the latter do not agree
with results of mapping replication bubbles trapped in agarose (Mesner et al., 2010).

In order to identify origins of DNA replication throughout the genome of MCF7
cells, nascent DNA was prepared according to our previous protocol (Gerbi and
Bielinsky, 1997). As summarized in Figure 2, genomic DNA was prepared from mid-log
phase cells using DNAzol (Invitrogen, Carlsbad, CA) and resuspended in TE. Replicative
Intermediate (RI) DNA was enriched by passing the genomic DNA over a column of
BND-cellulose. The ends of the RI DNA were phosphorylated using T4 Polynucleotide
Kinase (New England Biolabs, Ipswich, MA). Next the DNA was digested with lambda
exonuclease to enrich nascent strands, which are resistant to lambda exonuclease
digestion due to the presence of an RNA primer at their 5’ end. Finally, the nascent
strands were size fractionated (500-1500 bp) on low melting point agarose to
eliminate background from Okazaki fragments, which occur throughout the genome.
Enrichment of nascent strands was confirmed by real-time PCR assaying for
enrichment of the c-myc origin of replication (Tao et al., 2000). Primers were designed
to be specific to locus 11 (the c-myc origin) and a non-origin sequence about 6 kb
upstream at locus 1. The origin mapping experiments are being carried out by Dr.
Michael Foulk, a talented postdoc in the Gerbi lab. Because the c-myc origin
of replication was discovered in HeLa cells, we first compared the enrichment of
nascent strands between HeLa cells and MCF7 cells in order to confirm that the
replication origin maps to the same position at the c-myc locus in MCF7 cells. Our
results confirmed that this was indeed the case: in HeLa cells, the c-myc origin of
replication was enriched about 12 fold, while in the MCF7 cells it was enriched about
11 fold when we used the DNA nascent strand isolation protocol above, suggesting that an
origin of replication exists at the same locus in MCF7 cells (Figure 2). These
experiments also demonstrated the feasibility of isolating nascent strands from MCF7
cells for further analysis. Subsequently, we were able to reliably obtain ~100-150 ng
nascent strands from 100 µg starting genomic DNA from asynchronous MCF7 cells.
Real-time PCR showed that the preps were enriched for the c-myc origin between 11.0
and 19.6 fold (Figure 2).
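
The fold enrichment values above come from real-time PCR. As a rough illustration only, the sketch below shows one common way such a value can be computed (the delta-delta-Ct method), comparing the origin amplicon (locus 11) with the non-origin amplicon (locus 1) in nascent DNA relative to total genomic DNA. The report does not state which calculation was used; the function name, the Ct values and the assumption of perfect doubling per PCR cycle are all hypothetical.

```python
def fold_enrichment(ct_origin_nascent, ct_control_nascent,
                    ct_origin_genomic, ct_control_genomic,
                    efficiency=2.0):
    """Relative enrichment of the origin amplicon in a nascent-strand prep.

    Delta-delta-Ct approach: the origin-minus-control Ct difference in the
    nascent DNA is normalized to the same difference in total genomic DNA.
    `efficiency` is the per-cycle amplification factor; 2.0 (perfect
    doubling) is an assumption, not a measured value.
    """
    delta_nascent = ct_origin_nascent - ct_control_nascent
    delta_genomic = ct_origin_genomic - ct_control_genomic
    return efficiency ** -(delta_nascent - delta_genomic)


# Hypothetical Ct values: the origin amplifies 3.5 cycles earlier than the
# control in nascent DNA, and at the same cycle in total genomic DNA.
print(round(fold_enrichment(22.0, 25.5, 24.0, 24.0), 1))  # prints 11.3
```

Normalizing to total genomic DNA corrects for differences in primer efficiency between the two amplicons, so the ratio reflects enrichment in the nascent strand prep and not primer behavior.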

Several preparations of nascent DNA from MCF7 cells were pooled and
submitted for next generation sequencing. Ben Raphael aligned the resulting Illumina
reads to the human genome using MAQ, resulting in 5.6 million mapped reads. We
counted the number of reads that align to genomic intervals defined by 283 replication
origins identified in 1% of the human genome (Cadoret et al., 2008; ENCODE project).
For each of these 283 intervals, we compared the read count of the interval to the
expected read count under a uniform distribution of reads to intervals. We found that 78
of the 283 origins were enriched (P < 10^-3) for nascent strands (Figure 3; Table 2).
These initial results were encouraging, and suggest that our nascent DNA preparation is
enriching for replication origins. However, our sequencing coverage was not high
enough to robustly detect new replication origins and there was some contaminating
bacterial DNA.
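
The comparison of observed and expected read counts can be sketched as follows. This is a minimal illustration, not the analysis that was actually run: it assumes a Poisson model for the number of reads falling in an interval when reads are spread uniformly over the genome, and the function names, the genome length of 3.1 x 10^9 bp and the example intervals are hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if k <= 0:
        return 1.0

    def pmf(i):
        return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

    if k <= lam:
        # Near or below the mean: use the complement of the lower tail.
        return max(0.0, 1.0 - sum(pmf(i) for i in range(k)))
    # Above the mean: sum the tail directly to keep precision for small P.
    total, i, term = 0.0, k, pmf(k)
    while term > total * 1e-17:
        total += term
        i += 1
        term *= lam / i
    return total


def enriched_intervals(intervals, total_reads, genome_length, alpha=1e-3):
    """Return (name, observed, expected, p) for intervals with p < alpha.

    `intervals` is a list of (name, length_bp, observed_reads) tuples.
    """
    hits = []
    for name, length_bp, observed in intervals:
        expected = total_reads * length_bp / genome_length
        p = poisson_upper_tail(observed, expected)
        if p < alpha:
            hits.append((name, observed, expected, p))
    return hits


# Hypothetical example: two 2 kb intervals and 5.6 million mapped reads.
example = [("ori_a", 2000, 20), ("ori_b", 2000, 4)]
for name, obs, exp, p in enriched_intervals(example, 5_600_000, 3_100_000_000):
    print(name, obs, round(exp, 2), p)
```

With 283 intervals tested, a threshold of 10^-3 corresponds to fewer than one expected false positive under this null model.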

These experiments also proved difficult to reproduce. We determined that
the cause of this variability was the poor quality of the preparation of lambda
exonuclease we were using (our previous source from Invitrogen had been discontinued
so we had switched to enzyme from New England BioLabs). Through discussions with Drs.
Mechali and Prioleau, whose labs are in France, P.I. Susan Gerbi learned that the
company Fermentas could prepare high quality lambda exonuclease by special order.
Therefore, we contracted with Fermentas to obtain a high quality, high concentration
preparation of lambda exonuclease. The original nascent strand protocol was modified
so that the phosphorylated RI DNA was digested with 240 units of lambda exonuclease
(versus 15 units previously) overnight, following a protocol developed by Cadoret et al.
(2008) for mapping replication origins in the ENCODE subset of the human genome.
Using this protocol we achieved about 20 fold enrichment of the c-myc origin in MCF7
cells, which is excellent. We then, however, noticed some variability in nascent strand
enrichment from preparation to preparation. We did controls that revealed that the pH
optimum for the Fermentas recombinant lambda exonuclease is broader than for the
previous Invitrogen purified enzyme. At the previously used higher pH of 8.8 we found
that there was degradation of RNA; this would compromise the integrity of the RNA
primers on the nascent DNA and render the nascent DNA susceptible to lambda
exonuclease digestion (Figure 4). We found that the Fermentas preparation of lambda
exonuclease is still active at pH 8.0 and that RNA degradation does not occur at that pH
(Figure 4). Additionally, we found that heating the samples degraded the
RNA, so we have modified the original protocol to eliminate heating steps where
possible (Figure 4).

For the Illumina sequencing, we isolated about 50-150 ng nascent strands from
100 µg starting genomic DNA. Several nascent strand preparations were pooled (about
500 ng total) and subsequently amplified at the Yale University sequencing facility for
Illumina sequencing. However, it required a lot of effort to obtain this amount of nascent
DNA. Also, PCR artifacts can be introduced during the amplification step. To overcome
these problems, the DOD IDEA Extension grant will allow us to try Helicos rather than
Illumina sequencing. The first report of Helicos sequencing appeared just a year ago
(Harris et al., 2008) and the method holds much promise (Gupta 2008). This true single
molecule sequencing (tSMS) approach omits the necessity for DNA amplification, significantly
reducing the amount of nascent DNA starting material required to about 10 ng. In
comparison, the Illumina platform for complete genome coverage requires 500-1000 ng
nascent DNA. Moreover, since the nascent DNA preparation enriches for the single
stranded leading strands near an origin of replication, the sequences should map to
opposite strands on either side of the origin. These data will provide a signature for
authentic origins of replication, reducing the potential for calling false positives. There
are only a few Helicos machines in operation world-wide. We have been given access
on a fee-for-service basis to the Helicos sequencing machine at the Dana-Farber
Cancer Institute, where one of us (Alex Brodsky) was a postdoc prior to
joining the faculty at Brown University. We intend to also sequence the nascent DNA
using Illumina and compare the results between the two platforms. We anticipate that
there will be ~25,000-30,000 replication origins in the human genome. The replication
origins, once mapped, will be compared to estrogen receptor binding sites (the
ENCODE data showed a correlation for c-JUN and c-FOS as potential regulators of
origins; Cadoret et al., 2008) and to regions of DNA amplification in breast cancer cells.

To sum up, we have accomplished much more for Statement of Work Item 2 than
originally presented in the grant proposal. Instead of using ChIP-chip or ChIP-Seq to
map ORC binding sites, we have refined the methodology to allow direct mapping of all
active origins in the breast cancer genome by isolation of short nascent strands of DNA
and sequencing them. Our pilot run on the Illumina platform was successful. Based on
this progress, we are grateful to have received a DOD IDEA Extension award.

Statement of Work Item 3: Mapping DNA amplification sites in the breast cancer
genome

One of us (Ben Raphael, co-P.I.) and others have identified chromosomal
changes in the genome of MCF7 breast cancer cells (Volik et al., 2003 and 2006;
Raphael et al., 2008; Hampton et al., 2008), including sites of DNA amplification.
Therefore, once we have mapped the replication origins in MCF7 cells (Statement of
Work Item 2), we can directly compare their locations with the already mapped locations
of ER binding sites (Statement of Work Item 1) and DNA amplification (Statement of
Work Item 3), as per our specific aims for this grant. If we extend the study to breast
cancer tissue, array comparative genomic hybridization (aCGH) will be used to
determine the sites of DNA amplification in this tissue.

Months  1-6:  Refine  methods  to  analyze  DNA  amplification  (Raphael  lab) 

DNA double strand breaks have been shown to play a role in DNA amplification.
As stated in our previous progress report, co-P.I. Ben Raphael developed a novel
method called Neighborhood Breakpoint Correlation (NBC) to identify correlated
rearrangement breakpoints from CGH data (BMC Bioinformatics, in revision). Unlike
previous methods for aCGH analysis that focus on finding common genomic intervals of
amplification or deletion that might harbor oncogenes or tumor suppressor genes,
respectively, NBC focuses on the precise localization of the boundaries (breakpoints) of
these intervals. We hypothesize that pairs of such highly conserved breakpoints might
indicate fusion genes or other common rearrangements. The algorithm employs a
statistical model derived from the binomial distribution to assess the statistical
significance of breakpoints that are shared by multiple patients. The algorithm also identifies
genes or pairs of genes that each contain one or more breakpoints in a statistically
significant number of patients.
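
To illustrate the kind of binomial reasoning described above, the sketch below computes the probability that at least k of n patients share a breakpoint in a given genomic neighborhood by chance. It is not the NBC algorithm itself, whose details are not given in this report; the function name and the per-patient chance probability p are hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Here n is the number of patients, p is the (assumed) chance that one
    patient has a breakpoint in the neighborhood, and k is the number of
    patients observed to share it.
    """
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


# Hypothetical example: 5 of 36 patients share a breakpoint in a
# neighborhood where each patient has a 1% chance of one at random.
print(binomial_upper_tail(5, 36, 0.01))
```

A small tail probability indicates that the shared breakpoint is unlikely to be a coincidence; in a genome-wide scan the threshold would also need to account for the number of neighborhoods tested.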

In preliminary analysis, Ben Raphael examined a collection of 36 primary
prostate tumors for breakpoints in the well-known TMPRSS2-ERG fusion gene (Tomlins
et al., 2005). He applied NBC to identify changes in copy number (breakpoints) in each
patient and then identified common breakpoints that appear in a statistically significant
number of patients. Specifically, he identified 12 statistically significant rearrangements,
one of which is the TMPRSS2-ERG fusion gene. It is detected in 5 patients with a
p-value of 2.7 x 10^-10 (Figure 5). In a larger analysis, he examined a collection of data
from 233 patients with glioblastoma, including 227 primary tumor samples and 107
matched blood samples from The Cancer Genome Atlas. He predicted 93 statistically
significant rearrangements that are further classified as gene truncations, germline
structural variants, and fusion genes. The power of his method to detect correlated
breakpoints increases with larger sets of patients. We will apply these methods to the
aCGH data that will be generated in the present research project on DNA amplification
in breast cancer cells. This will allow us to uncover additional candidate fusion genes or
regulatory fusions, particularly fusions near ER binding sites.

We now report here some new data derived by computational analysis by co-P.I.
Ben Raphael. He combined aCGH data and ESP data with estrogen receptor (ER)
binding data in MCF7 breast cancer cells determined initially by co-P.I. Alex Brodsky
using chromatin immunoprecipitation (ChIP-chip) (Carroll et al., 2006). The original
ChIP-chip study (Carroll et al., 2006) interpreted the data in the context of the reference
human genome, even though it is well known that MCF7 exhibits extensive genomic
aberrations including copy number changes and structural rearrangements. We
examined how knowledge of these genomic changes affects the interpretation of the
ChIP-chip data. Using scan statistics (Glaz et al., 2001), we identified regions of the
reference genome that contained significantly few or significantly many ER sites. We
found 38 gaps, defined as sequenced genomic regions > 6.9 Mb with no ER binding
sites. Under the null hypothesis of ER sites distributed uniformly on the genome, the
probability of finding one such gap is < 10^-4. The copy numbers of the probes in these
gaps determined using Agilent 44K array CGH had a mean log2-ratio of -0.28, significantly
below the mean value of -0.05 over all probes (p-value by t-test < 10^-100), implying
that the gaps in ER binding sites are a result of deletions in the MCF7 genome.
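
The gap definition used above (a region longer than 6.9 Mb with no ER binding sites) can be applied to a list of site coordinates as in the following sketch. It is a simplified illustration, not the scan-statistic analysis itself: it treats each chromosome as fully sequenced from position 0 to its end, and the function name and the example coordinates are hypothetical.

```python
def find_gaps(site_positions, chrom_length, min_gap=6_900_000):
    """Return (start, end) intervals longer than `min_gap` bp with no sites.

    `site_positions` are binding-site coordinates on one chromosome; the
    chromosome ends are treated as boundaries (a simplifying assumption
    that ignores unsequenced regions such as centromeres).
    """
    edges = [0] + sorted(site_positions) + [chrom_length]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b - a > min_gap]


# Hypothetical 20 Mb chromosome with three binding sites.
print(find_gaps([1_000_000, 9_000_000, 10_000_000], 20_000_000))
# prints [(1000000, 9000000), (10000000, 20000000)]
```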

We next identified unusual clusters of ER binding sites in the ChIP-chip data,
where a cluster was defined as 12 or more ER sites in a 1 Mb region of the genome.
Under the null hypothesis, the probability of such a cluster is < 6 x 10^-5, but we identified
11 clusters in the data. The mean log2-ratio in these segments is 1.3, suggesting that
clusters of ER binding sites are preferentially found in amplified regions. This could be
because the ChIP-chip assay has higher sensitivity in amplified regions, or
because the statistical model used to call binding sites is imprecise in
amplified regions. To assess whether there might be amplification of the regions
harboring ER clusters, we examined BAC array CGH data on 51 breast cancer cell lines
from Neve et al. (2006). We found that one of the 11 ER clusters is preferentially
amplified in ER-positive cell lines and preferentially deleted in ER-negative breast
cancer cell lines (Figure 6). Moreover, we found differential expression of one of the
four genes in this region using Oncomine (Rhodes et al., 2007). TLE3 is significantly
under-expressed in ER-negative breast tumors compared to ER-positive breast tumors
(Sotiriou et al., 2003; Minn et al., 2005; Wang et al., 2005; Hess et al., 2006) (p-values
10^-12, 2 x 10^-11, 2 x 10^-10, 8 x 10^-9 based on t-test). This result is consistent with the
amplification of this region as determined by array CGH, and also suggests estrogen
dependent regulation of this gene.
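
The cluster definition used above (12 or more ER sites within a 1 Mb region) lends itself to a simple sliding-window search. The sketch below is one possible way to find such clusters on a single chromosome and is not the code used for the analysis; the function name and the example coordinates are hypothetical, and overlapping qualifying windows are merged into one cluster as a design choice.

```python
def find_clusters(site_positions, window=1_000_000, min_sites=12):
    """Return (start, end) spans holding >= `min_sites` sites within `window` bp.

    A two-pointer sweep over the sorted site coordinates counts the sites in
    each window that begins at a site; overlapping windows are merged.
    """
    sites = sorted(site_positions)
    clusters = []
    right = 0
    for left, start in enumerate(sites):
        # Advance `right` past every site inside [start, start + window).
        while right < len(sites) and sites[right] < start + window:
            right += 1
        if right - left >= min_sites:
            end = sites[right - 1]
            if clusters and start <= clusters[-1][1]:
                # Overlaps the previous cluster: extend it.
                clusters[-1] = (clusters[-1][0], max(end, clusters[-1][1]))
            else:
                clusters.append((start, end))
    return clusters


# Hypothetical example: 12 sites spaced 50 kb apart plus two isolated sites.
sites = [5_000_000 + 50_000 * i for i in range(12)] + [20_000_000, 40_000_000]
print(find_clusters(sites))  # prints [(5000000, 5550000)]
```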

We also looked for structural rearrangements that might yield regulatory fusions
between bound ER sites in the ChIP-chip data and genes identified as differentially
regulated in response to estrogen. Using ESP data to identify rearrangements (Volik et
al., 2006; Raphael et al., 2008), we found 27 examples of such candidate fusions (P < 2
x 10^-11 by a permutation test). Moreover, in 11/27 cases, the genes involved in these
putative fusions have no bound ER site within 100 kb of the transcription start site.
Several of these genes are implicated in breast cancer, including BRCC3 (a subunit in
the BRCA1/2 containing protein complex), PTK6, and STK6.

Months 21-24: Whole genome SNP arrays (Brodsky and Gerbi labs)

Because we shifted from tumor specimens to MCF7 breast cancer cells, SNP arrays
were no longer needed.

To sum up, we have done more work in Statement of Work Item 3 than
originally described in the grant application, having developed many tools for
computational analysis.

Concluding remarks - The recent findings that the transcription factor c-Myc interacts
with the pre-replication complex to control DNA replication (Dominguez-Sola et al.,
2007; Lebofsky and Walter, 2007) and that the androgen receptor interacts with MCM7
of the pre-replication complex (Shi, 2008) provide precedent for our hypothesis that
the ligand-bound estrogen receptor may play a direct role in regulating replication
origins beyond its traditional role as a transcription factor. We are grateful for the DOD
funding that allowed us to initiate experiments to test our hypothesis and look forward
eagerly to results from the DOD funded IDEA Extension award.

KEY  RESEARCH  ACCOMPLISHMENTS 

•  Further refinement in the method to isolate nascent (newly replicated) DNA,
lowering the pH to prevent RNA degradation during lambda exonuclease
digestion, thereby reducing the prep-to-prep variability in origin enrichment.

•  PCR mapping of the c-myc replication origin showed that it is located in the same
position in HeLa and MCF7 cells.

•  A trial run of Illumina sequencing of nascent strands included many of the
replication origins previously reported for 1% of the human genome.

•  Improvement of the methodology for analysis of aCGH data to identify common
aberrations and common breakpoints.

•  Computational analysis suggested that clusters of ER binding sites are
preferentially found in amplified regions.

REPORTABLE  OUTCOMES 

•  Method  to  isolate  nascent  (newly  replicated)  DNA 

•  Preliminary  data  mapping  replication  origins  in  the  breast  cancer  genome 

•  Methodology for analysis of aCGH data to identify common aberrations and
common breakpoints.

•  Computational results suggesting that clusters of ER binding sites are
preferentially found in amplified regions.


We anticipate writing a paper for publication in a high profile journal describing the
map of replication origins in the entire human genome (from breast cancer cells)
once our data are complete.

Based on our successful upward trajectory with these experiments, we have
received a DOD Idea Expansion award. This award will allow us to complete and
expand our promising experiments.

CONCLUSION 

Recent publications cited in this progress report support our hypothesis that the
estrogen receptor may interact with the replication machinery and promote DNA
amplification in breast cancer cells. We have improved the experimental protocol from
what was initially approved in this grant. Instead of identifying origins by ORC ChIP, we
are isolating size-fractionated nascent strands to use them for next generation
sequencing. Our results will be the first to map replication origins on the entire human
genome. The data will be compared to map positions of ER binding sites in the genome
and regions of DNA amplification. A positive correlation will directly support our
hypothesis and will provide a new way of thinking about the role of steroid hormones in
cancer. The results will begin to elucidate the mechanism of induction of DNA
amplification and could provide a platform for new methods of diagnosis and treatment
of breast cancer.
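The comparison of origin positions with ER binding sites described above could be approached as in the following sketch. It is only an illustration under stated assumptions: the function names, the dictionary layout keyed by chromosome, the toy coordinates, and the 10 kb window are hypothetical and are not taken from this report.

```python
from bisect import bisect_left

def nearest_distance(pos, sorted_midpoints):
    """Distance in bp from pos to the closest entry of a non-empty sorted list."""
    i = bisect_left(sorted_midpoints, pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_midpoints[max(i - 1, 0): i + 1]
    return min(abs(pos - m) for m in candidates)

def fraction_near(er_sites, origins, window_bp):
    """Fraction of ER sites lying within window_bp of any origin midpoint.

    er_sites and origins map chromosome name -> list of positions (bp).
    """
    total = near = 0
    for chrom, positions in er_sites.items():
        mids = sorted(origins.get(chrom, []))
        for pos in positions:
            total += 1
            if mids and nearest_distance(pos, mids) <= window_bp:
                near += 1
    return near / total if total else 0.0

# Hypothetical toy coordinates.
er_sites = {"chr8": [1_000, 50_000], "chr11": [200_000]}
origins = {"chr8": [1_500, 90_000]}
frac = fraction_near(er_sites, origins, window_bp=10_000)
print(frac)
```

The observed fraction would still need to be compared against a background expectation, for example by repeating the calculation with randomly shuffled site positions.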

Personnel  paid  from  this  grant: 

(nb  -  these  are  personnel  in  the  three  different  groups  of  PI  and  co-PIs;  some  of  the  lab 
personnel  only  worked  briefly  on  this  project) 

PI:  Susan  A.  Gerbi 

Co-P.I.s:  Alexander  Brodsky,  Benjamin  Raphael 
Postdoctorals:  Michael  Foulk,  Yutaka  Yamamoto 
Graduate  Students:  Crystal  Kahn,  Anna  Ritz 

Research Assistants: Jacob Bliss, Megan Frayne, Mark Grabiner, Sara Hillenmeyer,
Ingrid  Mercer,  Shellee  Morehead,  Hannah  Sanford,  Heidi  Smith 

Undergraduate  Dishwashers:  Carolyn  Crisp,  Sydney  Ember,  Emily  Hartman,  Theeradej 
Thaweerattanasinp 
APPENDICES


Appendix  1:  Meeting  Abstracts 

Meeting  abstract:  DOD  2008  Era  of  Hope  meeting  (Baltimore,  MD) 

Hormonal  Involvement  in  Breast  Cancer  Gene  Amplification 

Michael S. Foulk1*, Sara Hillenmeyer1*, Alexander S. Brodsky1, Benjamin J. Raphael1,
Shamlal Mangray2, Theresa Graves2 and Susan A. Gerbi1

1 Brown University, Providence, RI 02912

2 Lifespan (RI Hospital), Providence, RI 02903
* equal contribution

Genetic instability and rearrangements, including gene amplification, are a
hallmark of cancer. Amplification of the HER2 (ErbB2/Neu) gene occurs in invasive
breast cancer (~25%) and in ductal carcinoma in situ (50-60%). HER2 amplification and
concomitant over-expression of this growth factor receptor promote cancer cell growth, acting
as a metastasis-promoting factor. It would be desirable to prevent HER2 gene
amplification, thereby moderating the aggressive growth of breast cancer cells. The
problem is that no one knows what triggers gene amplification. Our recent research
suggested that the trigger may be the steroid hormone estrogen. Do genetic or
epigenetic changes produce novel binding site(s) for the estrogen receptor (ER) near
the HER2 replication origin, inducing gene amplification? Our hypothesis is that ER
interacts with the replication machinery to drive re-replication of the HER2 locus,
resulting in DNA amplification.

Our  specific  aims  and  the  study  design  are: 

(1) Map ER binding sites in surgically derived HER2 amplified breast cancer
tissue, using chromatin immunoprecipitation (ChIP) with an antibody against ER. The
immunoprecipitated DNA will be used as a probe for DNA microarray chips
("ChIP-chip") to screen the human genome for hormone receptor binding sites. We will look for
differences in ER binding sites between cancer cells and non-cancer cells from the
same patient. The positive candidates will be confirmed by quantitative PCR following
ChIP.


(2) Map replication origins using short nascent strands as probes for DNA
microarray chips. Data analysis will identify replication origins that are near ER binding
sites, with special attention given to novel ER sites in the cancer genome. An alternate
and/or confirmatory approach is sequential ChIP ("re-ChIP") on chip experiments where
DNA is immunoprecipitated by antibodies against ER and against Origin Recognition
Complex polypeptide 2 (ORC2), thereby pulling down DNA fragments bound by both
antigens.

(3) Quantify the level of HER2 amplification and identify sites of co-amplification in
the genome. DNA will be isolated from the same tissue samples used for specific aims
(1) and (2) for use as probes for whole genome SNP arrays to quantify gene copy
numbers, thereby identifying regions of amplification. The level of HER2 gene
amplification will be quantified and any sites of co-amplification will be determined. This
study will examine if a correlation exists between ER binding at novel sites in the breast
cancer genome and juxtaposition with putative replication origins that escape normal
cellular controls and re-replicate, leading to DNA amplification. This may provide a new
paradigm for hormonal induction of breast cancer via gene amplification, leading to new
methods of diagnosis and treatment. Our results will indicate if there are other regions
that co-amplify with the HER2 locus in the ER positive, HER2 amplified breast cancer
patient samples. Other co-amplified genes, within the HER2 amplicon and/or at other
regions, could serve as additional novel targets for therapies similar to the approach of
using Herceptin to target HER2.
Figures 1-6

SUPPORTING DATA

Figure 1: Representative gel showing good shearing of genomic DNA from an ER+
breast tumor sample using the Bioruptor. The largest signal observed is in the
range of 500 bp.
Flow Chart for Preparing Nascent DNA to Hybridize to Microarrays
Adapted from Tao et al., JCB 78: 442-57 (2000).

Figure 2: Flow chart of the nascent strand preparation protocol. At the bottom is a cartoon of the
c-myc locus. The arrows indicate the targets of the real-time PCR primers. At right is real-time PCR
data showing enrichment of the c-myc origin in both HeLa and MCF7 cells (top) and the maintenance
of enrichment after WGA amplification of the nascent strands (bottom).
Figure 3: Chart of Illumina reads from MCF7 cell nascent strands mapped to the 283 origins identified
by Cadoret et al., 2008. The red line indicates the cutoff for statistical significance.
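As an illustration of how a significance cutoff of this kind might be computed, the sketch below scores one origin with a Poisson upper-tail test and a Bonferroni correction. The background rate of 2.0 reads/kb, the use of 283 tests, and the function names are assumptions for illustration only; the report does not state how its p-values were derived. The coordinates and read count are taken from the chr9 row of Table 2.

```python
from math import exp

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)

def score_origin(start, end, num_reads, background_reads_per_kb, n_tests):
    """Return (reads per kb, Poisson p-value, Bonferroni-corrected p-value)."""
    length_kb = (end - start) / 1000
    reads_per_kb = num_reads / length_kb
    expected = background_reads_per_kb * length_kb
    p = poisson_sf(num_reads, expected)
    return reads_per_kb, p, min(1.0, p * n_tests)

# chr9 origin from Table 2; background rate and n_tests are hypothetical.
rpk, p, p_corr = score_origin(131_214_169, 131_216_497, 17,
                              background_reads_per_kb=2.0, n_tests=283)
print(round(rpk, 5), p, p_corr)
```

Computing the tail as one minus the cumulative sum loses precision for very small p-values, so a library routine would be preferable in a real analysis.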
Figure 4: Gels from the notebook showing that the Fermentas lambda exonuclease is active at pH 8.0
and that the rRNA is more stable at this pH over the course of the digestion. 5 µg of DNA/RNA was
digested at the indicated pH and for the time indicated. In the middle gel, the undigested sample was
loaded in two adjacent lanes with one receiving 90% of the sample and the other the remaining 10%. The
gel on the right shows that heating the sample to 75°C degrades the rRNA despite the pH (75° lanes).
Breakpoint in 5 Patients

Figure 5: We identify the TMPRSS2-ERG fusion gene in 5 prostate cancer patients. The segmentations
for each patient are shown in blue, and the asterisks denote probe locations. The deletion fuses the 5' end
of TMPRSS2 to the 3' end of ERG, and the relative copy number at these breakpoints is conserved across
the deletion.
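The idea of finding breakpoints shared across patients can be sketched as follows. This is a simplified stand-in, not the aCGH methodology developed in this project: it assumes each patient's profile has already been segmented, and the function name, tolerance, and toy segment coordinates are hypothetical.

```python
def common_breakpoints(segmentations, tolerance_bp, min_patients):
    """Group nearby breakpoints and keep groups seen in at least min_patients.

    segmentations maps patient id -> list of (start, end, log2_ratio) segments
    on one chromosome. A breakpoint is the boundary between adjacent segments.
    """
    points = []
    for patient, segs in segmentations.items():
        ordered = sorted(segs)
        for left, right in zip(ordered, ordered[1:]):
            points.append(((left[1] + right[0]) // 2, patient))
    points.sort()

    # Chain breakpoints whose consecutive gap is within the tolerance.
    groups, current = [], []
    for pos, patient in points:
        if current and pos - current[-1][0] > tolerance_bp:
            groups.append(current)
            current = []
        current.append((pos, patient))
    if current:
        groups.append(current)

    result = []
    for group in groups:
        patients = sorted({p for _, p in group})
        if len(patients) >= min_patients:
            result.append((group[0][0], group[-1][0], patients))
    return result

# Hypothetical segmentations: two patients share a deletion, one has none.
segs = {
    "p1": [(0, 1000, 0.0), (1001, 5000, -1.0), (5001, 9000, 0.0)],
    "p2": [(0, 1010, 0.0), (1011, 4990, -1.0), (4991, 9000, 0.0)],
    "p3": [(0, 9000, 0.0)],
}
shared = common_breakpoints(segs, tolerance_bp=50, min_patients=2)
print(shared)
```

In practice the tolerance would depend on probe spacing, since a breakpoint can only be localized to the interval between two adjacent probes.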
Figure 6: A statistically significant cluster of 15 ER binding sites (purple lines, top)
identified on Chromosome 15.
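A cluster of binding sites such as the one in Figure 6 could be located with a simple gap-based grouping, sketched below. The function name, the maximum gap, the minimum cluster size, and the toy positions are assumptions for illustration; this sketch only finds candidate clusters and does not assess their statistical significance.

```python
def cluster_sites(positions, max_gap_bp, min_sites):
    """Group sites whose consecutive spacing is at most max_gap_bp.

    Returns a list of (first_position, last_position, site_count) for every
    group containing at least min_sites sites.
    """
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap_bp:
            if len(current) >= min_sites:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_sites:
        clusters.append((current[0], current[-1], len(current)))
    return clusters

# Hypothetical ER binding-site positions (bp) on one chromosome.
sites = [100, 900, 1_500, 50_000, 50_400, 200_000]
found = cluster_sites(sites, max_gap_bp=1_000, min_sites=3)
print(found)
```

Significance would need a separate step, for example comparing the observed cluster sizes with those obtained after randomly repositioning the same number of sites.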
TABLE 1: Log of Breast Cancer Tumor Samples Received from R.I. Hospital

All tumor specimens were provided by Dr. Shamlal Mangray (Pathology Department, RI
Hospital) and were frozen at -80 °C. All samples were from female patients
(identity unknown - coded by the Pathology Department) without neoadjuvant
chemotherapy. Most were patients of Dr. Theresa Graves. The samples were 1.0-1.5 cm.


#   Code       Date      ER   PR   HER2           Age                   Comments
33  1          10/25/07  pos                                            used for H4 test
34  2          11/13/07                                                 also normal tissue
35  3          12/17/07  pos  pos  neg                                  also normal tissue
36  4          1/18/08                            post-menopausal (PM)  Path # 213A/J; normal tissue (tube J)
37  5
38  6          1/18/08                            45                    Path # 375C/F; used: C = H4 test (1 g); also normal tissue (tube F)
39  7
40  8          1/28/08   pos  pos  neg            60 (PM)               also normal tissue
41  9
42  10 (SG5)   1/28/08   pos  pos  neg            49                    used: H4/ER/mock (0.2 g); also normal tissue
43  11
44  12 (SG6)   1/28/08   pos  pos  neg            41                    also normal tissue
3   SG8        9/09      3+   2+   Neg            73                    8 cm tumor, lymph node mets
5   SG10       9/09      3+   1+   Neg            28                    (5-10%); 4 cm tumor, sentinel lymph node (SLN) micromets
6   SG11       9/09      3+   3+   Neg            54                    2.5 cm tumor, no mets to SLN
8   SG13       9/09      Neg  Neg  Neg            53                    1.7 cm tumor, no mets to SLN
9   SG14       9/09      3+   3+   2+ (FISH neg)  33                    4 cm tumor, no mets to SLN
10  SG15       9/09      3+   3+   2+ (FISH neg)  66                    2.4 cm tumor, No SLN sampling, FNA of node negative
11  SG16       9/09      Neg  Neg  2+ (FISH neg)  80                    1.1 cm tumor, no mets to SLN
12  SG17       9/09      3+   Neg  2+ (FISH neg)  69                    2.5 cm tumor, Axillary lymph node (ALN) negative
13  SG18       9/09      3+   3+   FISH neg       59                    1.3 cm tumor, no mets to SLN
4   19         11/8/10   pos  pos  neg                                  also normal tissue
5   20         11/8/10   pos  pos  neg                                  also normal tissue
6   21         11/8/10   pos  pos  neg                                  also normal tissue
7   22         11/8/10   pos  pos  neg                                  also normal tissue
8   25         11/8/10   pos  pos  neg                                  also normal tissue
9   26         11/8/10   pos  pos  neg                                  also normal tissue
10  27         11/8/10   pos  pos  neg                                  also normal tissue
11  28         11/8/10   pos  pos  neg                                  also normal tissue
12  29         11/8/10   pos  pos  neg                                  also normal tissue
13  30         11/8/10   pos  pos  neg                                  also normal tissue
14  31         11/8/10   pos  pos  neg                                  also normal tissue


TABLE 2: Spreadsheet of MCF7 nascent DNA Illumina reads mapped to the
ENCODE data set.


Row  Origin - Chrom  Origin Start  Origin End  Num Reads  Reads/kb  pval (Poisson)  Corrected pval  Mapped  Mapped/kb

2  chr11  64131110  64132987  37  19.71231  0  0

26510 
49  0.94 

680 

3 

chrl 

1 

6428967 

4 
22.515

76

0

0

4
6432689

7
0

0
0

0
3
65
31  27
74

0
17.014

69
0.01676

90  chr9  131214169  131216497  17  7.30241  0.00006  0.01676
91  chr13  112811287  112812598  9  6.86499  0.00045  0.12623

1* 

chrl 

4 

9888185 

9 

9888320 

29 

6.7014 

1 

0.00045 
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